Enhanced descriptive captioning model for histopathological patches
Authors
Abstract
The interpretation of medical images into natural language is a developing field of artificial intelligence (AI) called image captioning. It integrates two branches of AI, computer vision and natural language processing, and is a challenging topic that goes beyond object recognition, segmentation, and classification, since it demands an understanding of the relationships between the various components of an image and of how these objects function as visual representations. Content-based image retrieval (CBIR) can use a captioning model to generate captions for a user's query image. The common architecture of captioning systems consists mainly of a feature-extraction subsystem followed by a caption-generation lingual subsystem. In this paper we aim to build an optimized captioning model for histopathological patches of stomach adenocarcinoma endoscopic biopsy specimens. For the feature-extraction subsystem we carried out two evaluations. First, we tested 5 different models (VGG, ResNet, PVT, SWIN-Large, ConvNEXT-Large) with (LSTM, RNN, bidirectional-RNN) caption generators, then compared (LSTM without augmentation, LSTM with augmentation, and a BioLinkBERT-Large embedding layer with augmentation) to find the most accurate combination. Second, we tested 3 concatenation pairs among (SWIN-Large, PVT_v2_b5, ConvNEXT-Large) to find the most expressive extracted feature vector, with the pre-trained extractors compared against the LSTM-based generator in both evaluations to select the best model. Our experiments showed that building the system with the concatenation of ConvNEXT-Large and PVT_v2_b5 as the feature extractor, combined with the chosen caption-generation subsystem, produces the best results among the other combinations.
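To make the described architecture concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the winning configuration: two pre-trained backbones, assumed here to be loaded as timm's convnext_large and pvt_v2_b5, produce pooled feature vectors that are concatenated and used to initialize a single-layer LSTM caption decoder trained with teacher forcing. The class name, layer sizes, and vocabulary size are illustrative assumptions.

```python
import torch
import torch.nn as nn
import timm  # assumption: timm exposes 'convnext_large' and 'pvt_v2_b5' checkpoints


class DualEncoderCaptioner(nn.Module):
    """Hypothetical captioner: concatenated backbone features condition an LSTM decoder."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # num_classes=0 makes timm return pooled feature vectors instead of class logits
        self.enc_a = timm.create_model("convnext_large", pretrained=True, num_classes=0)
        self.enc_b = timm.create_model("pvt_v2_b5", pretrained=True, num_classes=0)
        fused_dim = self.enc_a.num_features + self.enc_b.num_features
        self.init_h = nn.Linear(fused_dim, hidden_dim)    # image features -> initial hidden state
        self.embed = nn.Embedding(vocab_size, embed_dim)  # could be swapped for BioLinkBERT-Large embeddings
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # Concatenate the two extracted vectors into one more expressive representation.
        feats = torch.cat([self.enc_a(images), self.enc_b(images)], dim=1)
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)  # (1, batch, hidden_dim)
        c0 = torch.zeros_like(h0)
        tokens = self.embed(captions)                     # teacher forcing on ground-truth captions
        hidden, _ = self.decoder(tokens, (h0, c0))
        return self.out(hidden)                           # per-step vocabulary logits


# Usage sketch: two 224x224 RGB patches, captions of 20 token ids, vocabulary of 5000 words.
model = DualEncoderCaptioner(vocab_size=5000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 5000])
```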
Similar articles
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role in understanding images and has demonstrated its effectiveness in generating natural language descriptions of them. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
Sequence to Sequence Model for Video Captioning
Automatically generating video captions in natural language remains a challenge for both natural language processing and computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have proved to be effective in visual interpretation. Based on a recent sequence-to-sequence model for video captioning, which is designed to learn the temporal structure of the se...
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Existing image captioning approaches typically train a one-stage sentence decoder, which makes it difficult to generate rich, fine-grained descriptions. On the other hand, multi-stage image captioning models are hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which...
A Novel Mucilage from Ficus glomerata Fruits for Transdermal Patches: Taking Indomethacin as a Model Drug
The present study was performed to explore the matrix property of Ficus glomerata fruit mucilage for making transdermal patches. The mucilage was evaluated for its physicochemical properties. Various transdermal patches of indomethacin were prepared by solvent evaporation technique using different proportions of F. glomerata fruit mucilage. The compatib...
Oracle Performance for Visual Captioning
The task of associating images and videos with a natural language description has attracted a great amount of attention recently. The state-of-the-art results on some of the standard datasets have been pushed into the regime where it has become more and more difficult to make significant improvements. Instead of proposing new models, this work investigates performances that an oracle can obtain...
Journal
Journal title: Multimedia Tools and Applications
Year: 2023
ISSN: 1380-7501, 1573-7721
DOI: https://doi.org/10.1007/s11042-023-15884-y